The Devereux Student Strengths Assessment Mini (DESSA-Mini) (LeBuffe, Shapiro, & 
Naglieri, 2014) efficiently monitors the growth of Social-Emotional Competence (SEC) 
in the routine implementation of Social Emotional Learning programs. The DESSA-Mini 
is used to assess approximately half a million children around the world. Since 
behavior rating scales can have ‘rater bias’, this paper examines rater characteristics that 
contribute to DESSA-Mini ratings. Rater characteristics and DESSA-Mini ratings were 
collected from elementary school classroom teachers (n=72) implementing TOOLBOX 
in a racially/ethnically diverse California school district. Teachers rated 1,676 students, 
who scored similarly to a national reference group. Multilevel modeling analysis showed 
that only 16% of variance in DESSA-Mini ratings was attributable to raters. 
Relationships between teacher characteristics and ratings were estimated to examine 
rater variance. Collectively, four characteristics of teachers (perceived barriers to student 
learning, sense of their ‘typical’ student’s level of SEC, anticipation of SEL program 
implementation challenges, and intentions to fully implement a newly adopted SEL 
program) accounted for bias in teacher-generated DESSA scores, leaving only 10% of 
the variance unexplained. Identified sources of ‘rater bias’ can be controlled for in 
research and addressed through thoughtful program selection, training, and 
implementation. 

Introduction 

Nearly 20% of youth in the United States have a mental, emotional, or behavioral problem (Kessler 
et al., 2012). The presence of a mental, emotional, or behavioral problem makes it less likely that a young 
person will reach important developmental and social milestones of adolescence, which in turn increases the 
likelihood of problems in adulthood (Copeland, Wolke, Shanahan, & Costello, 2015). Mental, emotional, and 
behavioral disorders and their consequences to society cost the United States roughly $247 billion annually 
(O'Connell, et al., 2009). This cost does not include the personal hardship experienced by each individual 
child and family challenged to navigate a complex social environment without the tools to do so successfully. 

Longitudinal research has identified reliable predictors of youth mental, emotional, and behavioral 
problems (Catalano et al., 2012). These predictors serve as clues as to what characteristics and experiences 
disrupt typical youth development and what skills and supports children need to succeed. To promote positive 
youth development, communities act intentionally (‘intervene’) in hopes of reducing children’s experiences of 
adversity (reducing ‘risk factors’) while augmenting children’s strengths (increasing ‘protective factors’). 
Findings from resilience research have revealed that most children have both intrinsic and learned capacities 
to overcome the adversities they face (Masten, 2014). Social Emotional Learning (SEL) interventions in 
schools are intended to uncover, recognize, and nurture these endemic capacities in children, disrupting 
trajectories toward problem occurrence, and strengthening their prospects for school and life success. An 
emerging science demonstrates that SEL programs can impact a broad array of important child outcomes, 
such as preventing aggression, anxiety, bullying, conduct problems, delinquency, drug use, and truancy, while 
promoting emotional regulation, prosocial skills, and academic achievement (Abbott, et al., 1998; 
Domitrovich, Cortes, & Greenberg, 2007; Durlak, Weissberg, Dymnicki, Taylor, & Schellinger, 2011; 
Espelage, Rose, & Polanin, 2015; Flay & Allred, 2003; Greenberg et al., 2003). 

To advance our knowledge about whether specific SEL programs work, for whom, and under 
what conditions, we need psychometrically sound assessment tools that allow us to observe the impact of the 
intervention on the growth of protective factors (Naglieri, LeBuffe, & Shapiro, 2013). Such tools, if practical 
enough for routine use, can also facilitate the high-quality implementation of SEL programs in multiple ways. 
For example, initial assessment can help teachers and student service personnel identify students with the 
greatest strengths and needs to target with interventions (Naglieri, LeBuffe, & Shapiro, 2011). Also, repeated 
assessment can determine whether the SEL intervention is having its intended impact on students in real-time, 
or if changes to the nature, intensity, or implementation quality of the intervention need to be made 
(Simmons, Shapiro, Accomazzo, & Manthey, 2016). This type of monitoring is particularly useful in regions 
where SEL programs tend to be either untested or imported from other contexts and adapted for local 
populations and service settings (Perez-Gomez, Mejia-Trujillo, Brown, & Eisenberg, 2016). 

The Devereux Student Strengths Assessment (DESSA) Mini (Naglieri, LeBuffe, & Shapiro, 2014) 
was designed to overcome obstacles to screening and monitoring the growth of Social-Emotional Competence 
in the routine implementation of SEL programs (Maras, Thompson, Lewis, Thornburg, & Hawks, 2015). 
With only 8 items, the DESSA-Mini is a behavior rating scale that can be completed by teachers and 
out-of-school time program staff in just one minute (Shapiro, Kim, Robitaille, & LeBuffe, 2016). This 
strength-based assessment system, which includes four interchangeable brief forms and a longer full assessment 
(LeBuffe, Shapiro, & Naglieri, 2014), is now being used to assess approximately a half million children each 
year in the United States, and in countries such as Australia, Canada, Mexico, Qatar, South Africa, and the 
United Kingdom. The English and Spanish language instruments, normed on a representative sample of youth 
aged 5 to 14 in the United States, have also been translated (e.g., Italian Edition; LeBuffe, Shapiro, & Naglieri, 
2015), normed, and culturally adapted (e.g., Dutch Adaptation; LeBuffe, Shapiro, Naglieri, Pont, & Punt, 
2013) for use in other countries. The instruments are being used by researchers in various regions of the globe 
to determine the effectiveness of SEL interventions; examples include the Random Acts of Kindness 
Curriculum (PI: Kimberly Schonert-Reichl) in Canada and the Cool to Be Me Programme (PI: Linda Bruce) 
in South Africa (SEL Consulting, 2015). 

The DESSA-Mini uses a format that is common to many behavior rating scales, measuring the 
frequency of a student’s behavior relative to a standardized reference group. The DESSA-Mini is completed 
by indicating, for each item, how often in the past four weeks the student performed a specific positive 
behavior. Item ratings are summed and converted to a T-score, referred to as the Social Emotional Total. 

Behavior rating scales like the DESSA-Mini have many perceived benefits (Shapiro, Accomazzo, 
Claassen, & Fleming, 2015). They can be used to efficiently collect information about behavior performance 
across settings, from multiple informants, and over multiple time points. They tend to have broad coverage 
and are somewhat more practical to administer, score, and interpret compared to other data collection options 
(e.g., direct observation) (McKown, 2015). These are important advantages for supporting primary prevention 
programs (LeBuffe & Shapiro, 2004). Yet, behavioral rating scales have also been criticized for their potential 
to incorporate rater bias into assessment scores because each item requires interpretation, reflection, and 
judgment by the rater (Elliot, Frey, & Davies, 2015; Hoyt & Kerns, 1999). In other words, behavior rating 
scale scores are likely to reflect characteristics of the rater as well as the student being rated (Hoyt, 2000). 

Rater bias is a form of non-random measurement error, or systematic variance that is attributable to 
the rater (Hoyt & Kerns, 1999). Rater bias may artificially inflate or suppress assessment scores relative to the 
actual frequency of behavior. A large amount of rater bias is problematic in practice settings because scores 
could be less precise than are desired for clinical and educational decision making. A large amount of rater 
bias is also problematic in research because it reduces the capacity to fully estimate (and ultimately detect) 
relationships between variables. Although systematic variance is difficult to observe in routine practice, it can 
be corrected once it has been identified (Mason, Gunersel, & Ney, 2014). Sources of rater bias are important 
to uncover to ensure scores that inform decision-making in both the research and practice realms are reliable, 
valid, and equitable. 

Rater bias comes in two forms: dyad-specific variance and rater-specific variance (Hoyt & Kerns, 
1999). Dyad-specific variance is inherently about the interaction between the rater and the student being 
rated, which occurs when a rater has a different reaction to particular students. Specifically, a characteristic of 
the student (e.g., disability), unrelated to the construct being measured, influences the way the adult rates the 
student. Studies have experimentally induced rater bias by varying the gender or diagnostic label assigned to 
children in the assessment processes (e.g., Foster & Ysseldyke, 1976; Kelter & Pope, 2011), but such randomized 
designs eliminate the effect of individual rater differences rather than examine them. 

Rater-specific variance reflects rater differences that are consistent across targets (Hoyt, 2000). Put 
simply, different raters can react to the same questions differently, regardless of the student they are rating. 
The average rating from one teacher may deviate from the average rating across all teachers in ways that are 
predictable, reflecting how the rater generally perceives students in the domain being assessed (e.g., 
Social-Emotional Competence) or reacts to the assessment prompt, items, or response choices. If a teacher’s average 
rating trends positive, the rater is said to be lenient (Ford, 1931). If a teacher’s average rating trends negative, 
the rater is said to be severe. Understanding the nature and source of leniency and severity errors could inform 
score interpretation in research and practice. 

Rater variance in the assessment of Social-Emotional Competence is difficult to explore in routine 
practice where there is usually only one rater per student, and the obtained score is treated as the ‘true’ score. 
Assessment developers often conduct small inter-rater reliability studies to broadly understand the extent to 
which a pair of raters agree in their assessment of the same child (Gresham, Cook, Vance, Elliott, & Kettler, 
2010). Inter-rater reliability studies of the DESSA-Mini, for example, have shown correlations that range 
from .70 to .81 across the four forms, and scores that differ, on average, by 0 to .60 T-score points (Naglieri et al., 
2014). Studies like these provide evidence that behavior rating scale scores do reflect characteristics of the 
rater, to some extent, in addition to the student being rated. On the other hand, these studies do not reveal the 
source of the bias, or clarify how one might address it when interpreting or using the scores. 

Given that there is no consensus indicator for ‘true’ levels of student Social-Emotional Competence, 
validity studies that attempt to discover which teacher has the more ‘correct’ perception are not tenable at this 
time. Alternatively, we can use statistical techniques to determine the extent to which a given teacher’s ratings 
of his or her students, on average, deviate from all other teachers’ ratings of their students. Teacher 
characteristics that predict these deviations can be considered sources of rater bias. 

Although questions about how teacher perceptions impact ratings arise frequently in practice settings, 
a recent review (Schultz & Evans, 2012) found only 37 articles on the topic. Mason and colleagues (2014) 
note that most of the articles written about rater bias are conceptual and “do not offer quantifiable evidence of 
mean differences directly attributable to teacher characteristics of beliefs” (p. 1019). Additionally, they argue 
that, given the large number of teacher variables that may influence behavior ratings, initial inquiries need to 
look through one lens at a time. The current paper examines potential sources of rater-specific biases in rating 
student Social-Emotional Competence through the lens of implementation science (the systematic study of 
implementation). 

Implementation is a term used to describe the activities designed to put an intervention into practice. 
A central lesson from implementation science is the importance of the program implementors to the ultimate 
success of an intervention (Elias, Zins, Graczyk, & Weissberg, 2003). Examining ‘rater bias’ through the lens 
of implementation science encourages us to understand the contributions of the rater as an essential part of the 
assessment, intervention, and evaluation process rather than overlooking or isolating them as ‘noise’ in 
measurement. 

There is a burgeoning literature on contextual variables that impact the implementation of SEL 
programs in schools (Elias, 2007; Fagan, Hawkins, & Shapiro, 2015; Greenberg, Domitrovich, Graczyk, & 
Zins, 2005). Durlak & DuPre’s (2008) systematic review of this literature identified teacher characteristics 
consistently associated with implementation success: perceptions that the intervention is needed, expectations 
that the intervention will be beneficial, and having the requisite skills and confidence to do what is expected. 
Additional organizational factors identified with implementation success included a positive work climate and 
staff norms regarding change. It may be that the same characteristics that predict implementation success also 
predict the ways in which teachers rate student behavior. 

This paper seeks to determine whether teacher characteristics that impact the successful 
implementation of SEL programs are similar to those that explain rater bias in the assessment of student 
Social-Emotional Competence. Specifically, this study examines the extent to which DESSA-Mini ratings are 
affected by teacher attitudes, capacities, and expectations, perceptions of implementation, impact, and school 
climate, and finally, what they generally perceive to be the levels of Social-Emotional Competence within 
themselves and others. Each of these teacher characteristics was hypothesized to reveal a potential leniency or 
severity error in the completion of the DESSA-Mini. 

Methods 

Study and Data Description 

The TOOLBOX Implementation Research Project is a quasi-experimental study of TOOLBOX 
(Collin, 2015), a commonly used Social-Emotional Learning (SEL) program aimed at enhancing 
Social-Emotional Competence among students in Kindergarten through 6th grade. TOOLBOX provides a common 
language to guide school and family support for children’s social and emotional development through the 
instruction and application of 12 tools (e.g., the Breathing Tool, the Garbage Can Tool). Developed to be an 
inherently practical SEL intervention, TOOLBOX strives to augment approaches that are natural to teachers 
and caregivers to reveal tools endemic to children. Specifically, TOOLBOX seeks to foster self-awareness, 
social-awareness, self-management, decision-making, and relationship skills in children through explicit 
lesson plans, classroom and school-wide strategies, and integration/reinforcement at home. TOOLBOX has 
been widely implemented in Northern California school districts and has been explored in two studies. The 
West Costa County Unified School District Evaluation (Dovetail Learning, 2013) found that teachers reported 
using and valuing TOOLBOX on a post-intervention survey. The Sonoma County Collaboration for Resilient 
Children pre/post evaluation found that, after just four months of using TOOLBOX, teachers and yard aides 
perceived significantly higher emotional and behavioral strengths (i.e., interpersonal, intrapersonal, and 
affective strengths) in children, relative to baseline (DeLong-Cotty, 2011). 

The current study features the implementation of TOOLBOX within one Northern California School 
District. This district served 10,982 students during the 2014-2015 academic year (District, 2016). Six 
elementary schools were each assigned to one of the following conditions: (1) the TOOLBOX ‘standard’ 
implementation - a higher-dosage condition which included full TOOLBOX lesson plans, a compendium of 
TOOLBOX strategies and practices, and a full complement of material resources, (2) the TOOLBOX ‘primer’ 
implementation - a lower-dosage condition which included only the most essential TOOLBOX strategies and 
practices, and only brief introductory lessons to the TOOLBOX tools, without the benefit of full lesson plans 
or material resources, and (3) a measurement-only comparison condition. 

The four elementary schools assigned to implement the Standard or Primer versions of TOOLBOX 
serve a racially and ethnically diverse student body (53% Hispanic/Latino(a), 16% Asian/Asian American, 
13% Black/African American, 8% White/European American, 7% Filipino, and 3% Other) with 42% of 
students primarily speaking a language other than English (e.g., Spanish, Cantonese, Mandarin, Tagalog, 
Vietnamese, Arabic) in their homes (District, 2016). Close to 70% of students had a household income of less 
than $44,123 annually for a family of four. In 2015, 27% of students met or exceeded the state educational 
standards in Language Arts/Literacy and 24% did so in Mathematics. 

Five days prior to the start of instruction for the 2015-2016 school year, teachers and staff from the 
four elementary schools using the Standard or Primer versions of TOOLBOX received a six-hour training to 
prepare them to implement TOOLBOX. Data were collected before and after the training to learn about the 
teachers and their teaching environment, collect their feedback on the training, and understand their 
expectations for program implementation. Of the 101 classroom teachers in schools implementing 
TOOLBOX, 94% attended the training. Of the classroom teachers in attendance, 76% completed a 
pre-training survey and 75% completed a post-training survey. With 99% of survey participants consenting for 
their responses to be used in research, the analysis sample for this paper became 72 classroom teachers. 

During October of 2015 (29-34 days of instruction into the school year), classroom teachers assessed 
their students’ Social-Emotional Competence using the Devereux Student Strengths Assessment (DESSA) 
Mini (Naglieri et al., 2014), a brief 8-item universal screening and progress monitoring tool. The 72 teachers 
in this study each completed the DESSA-Mini on an average of 23 students (range 4-31), collectively 
completing DESSA-Minis on 1,676 students. At this time, teachers also completed an SEL Programming 
Survey. Of the 72 teachers in the analysis sample, 70 completed the October SEL Programming Survey. The 
university human subjects Institutional Review Board approved all research processes. 

Sample 

The sample in this study is described with valid percentages (see Table I). It includes 72 credentialed teachers 
who taught students in transitional-kindergarten through 5th grade. The majority provided general education 
instruction in English. Although 13% of teachers were new to the district this year, 46% had worked in the 
district for more than 10 years. Of those who provided a response to the question about their racial/ethnic 
identity (69 teachers, or 96%), 59% identified as White/European American, 12% as Asian/Asian American, 
12% as Hispanic/Latino(a), 7% as multi-race, 6% as Black/African American, and 4% as other. 

Table I. Additional Teacher Demographics 

| Characteristic | # of responses | % |
|---|---|---|
| Gender (n = 72) | | |
| Male | 5 | 6.9 |
| Female | 67 | 93.1 |
| First Generation College Graduate (n = 72) | | |
| Yes | 29 | 40.3 |
| No | 43 | 59.7 |
| Grades taught (n = 72) | | |
| Transitional kindergarten | 2 | 2.8 |
| Kindergarten | 10 | 13.9 |
| 1st Grade | 16 | 22.2 |
| 2nd Grade | 13 | 18.1 |
| 3rd Grade | 10 | 13.9 |
| 4th Grade | 10 | 13.9 |
| 5th Grade | 11 | 15.3 |
| Primary language used in instruction (n = 72) | | |
| English | 70 | 97.2 |
| Spanish | 1 | 1.4 |
| English & Spanish equally | 1 | 1.4 |
| Teach Special Education (n = 72) | | |
| Yes | 4 | 5.6 |
| No | 68 | 94.4 |
| Live in the district (n = 72) | | |
| Yes | 14 | 19.4 |
| No | 58 | 80.6 |
| Eagerness to adopt new school initiative (n = 62) | | |
| Not eager at all | - | - |
| Slightly eager | 2 | 3.2 |
| Somewhat eager | 12 | 19.4 |
| Eager | 31 | 50.0 |
| Very eager | 17 | 27.4 |
| Preference for trying a new school initiative (n = 62) | | |
| Highly structured initiatives | 3 | 4.8 |
| Highly structured first, then flexible initiatives | 25 | 40.3 |
| Highly flexible first, then structured initiatives | 19 | 30.7 |
| Highly flexible initiatives | 11 | 17.7 |
| No preference | 4 | 6.5 |
| Training quality ratings (n = 62) | | |
| Excellent | 23 | 37.1 |
| Good | 33 | 53.2 |
| Fair | 5 | 8.1 |
| Poor | 1 | 1.6 |
| Very poor | - | - |
| Years worked in the district (n = 72) | | |
| <1 year | 9 | 12.5 |
| 1-2 years | 9 | 12.5 |
| 3-5 years | 10 | 13.9 |
| 6-10 years | 11 | 15.3 |
| 11-20 years | 24 | 33.3 |
| 20+ years | 9 | 12.5 |

Teachers in this sample reported that they are generally eager or very eager (77%) to adopt new 
initiatives at school. When asked about their general preferences for rolling out a new initiative at school, 
40% of teachers preferred initial structure with increasing flexibility, 31% preferred initial flexibility with 
increasing structure, 18% preferred highly flexible at all times, and 5% preferred highly structured at all 
times. About 6% of teachers reported no preference. Prior to the August TOOLBOX training, no teacher had 
ever used TOOLBOX, but 13% had observed TOOLBOX in practice and 8% had attended a previous 
TOOLBOX training. At the end of the training, 90% of teachers rated the training quality as good (53%) or 
excellent (37%). 

Measures 

Social-Emotional Competence. The DESSA-Mini Form 1 (Naglieri et al., 2014) includes eight items 
that ask raters to report the frequency (never = 0, rarely = 1, occasionally = 2, frequently = 3, very frequently = 4) 
of observed positive behaviors of the child in the past four weeks. The 8 items (α = 0.95) are summed to 
create a Raw Score Total. The Raw Score Total is then converted into a standardized T-score (M = 50, SD = 
10) based on the national norms, yielding the Social Emotional Total (SET). High SET scores (T-scores of 60 
and above) are a strength, SET scores between 41 and 59 (inclusive) are typical, and low SET scores 
(T-scores of 40 and below) point to a need for instruction. The U.S. norm sample has been independently 
reviewed and judged as representative and sufficiently large for interpretation of this nature (Merrell & 
Gueldner, 2010). 
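
The scoring steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `raw_score_total` and `set_category` are hypothetical, and the conversion from Raw Score Total to T-score is deliberately left out because it relies on the publisher's norm tables, which are not reproduced in this paper.

```python
from typing import Sequence


def raw_score_total(item_ratings: Sequence[int]) -> int:
    """Sum eight item ratings, each coded 0 (never) to 4 (very frequently)."""
    if len(item_ratings) != 8:
        raise ValueError("expected exactly 8 item ratings")
    if any(r not in range(5) for r in item_ratings):
        raise ValueError("each rating must be an integer from 0 to 4")
    return sum(item_ratings)


def set_category(t_score: float) -> str:
    """Map a Social Emotional Total T-score to the descriptive range in the text."""
    if t_score >= 60:
        return "strength"
    if t_score <= 40:
        return "need for instruction"
    return "typical"  # T-scores of 41 through 59


# Hypothetical usage: the T-score would come from the norm tables (not shown).
print(raw_score_total([3, 3, 2, 4, 3, 2, 3, 4]))  # 24
print(set_category(52))                            # typical
```

Keeping the norm lookup separate from the range labels mirrors the two-step description above: the Raw Score Total is a simple sum, while the interpretation depends only on the resulting T-score.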

Teacher attitudes. At training, teachers reported the importance of Social-Emotional Competence to 
school success (unimportant = 1 to essential = 5), and their eagerness to use TOOLBOX (not eager at all = 1 to 
very eager = 5). 

Teacher capacities. At training and in October, teachers reported the extent to which they felt 
informed (uninformed = 1 to very informed = 5) about TOOLBOX and confident (no confidence = 1 to very 
confident = 5) in their capacity to implement TOOLBOX. Teachers were also asked to state the ‘tagline’ (i.e., 
mantra or slogan) associated with a Tool to directly assess their knowledge about TOOLBOX (0 = incorrect; 
1 = correct). 

Teacher expectations. At training, teachers reported to what extent they: (1) personally anticipated 
implementing TOOLBOX relative to others at school (least fully = 1, most fully = 10), (2) believed 
TOOLBOX would benefit students (no benefit = 1, very beneficial = 5), and (3) anticipated challenges in 
implementing TOOLBOX (low challenge = 1 to high challenge = 5). 

Teacher perceptions of implementation and impact. In October, teachers reported to what extent they: 
(1) were implementing TOOLBOX relative to others at school (least fully = 1, most fully = 10), and (2) 
believed TOOLBOX has benefited their students (no benefit = 1, very beneficial = 5). 

School climate. In October, teachers reported the extent to which they perceived barriers to student 
learning, experienced barriers to providing effective instruction, experienced stress at work, and experienced 
conflict at work (very low = 0 to very high = 4). Furthermore, teachers reported (very poor = 0 to great = 4) 
on the overall learning (“I would describe our school as a ____ place for students to learn”) and working (“I 
would describe our school as a ____ place for adults to work”) environment of the school. 

Social-Emotional Competence (SEC) in self, a typical colleague, and a typical student. In October, 
teachers reported (very low = 0 to very high = 4) their own SEC (“Social-Emotional Competence refers to an 
awareness of, and ability to manage emotions in, a context-appropriate manner. How do you think your 
colleagues would rate your social-emotional competence, as it shows up at school?”); that of a typical 
colleague (“How would you rate the SEC of the typical colleague you work with at school?”) and that of a 
typical student they teach ("Think of a child that is fairly representative of the children with whom you work. 
How would you rate the Social-Emotional Competence of this child?"). 

Analysis 

In order to account for clustering in the data and to address missing data (2.8%-26% across all 
predictor variables, see Table II), hierarchical linear modeling with maximum likelihood estimation 
(Rabe-Hesketh & Skrondal, 2012) was used to estimate the relationship between teachers’ ratings of student 
Social-Emotional Competence (DESSA-Mini scores; level one) and teachers’ self-reported characteristics and 
perceptions (from pre-training, post-training, and October SEL Programming surveys; level two). First, to 
identify specific teacher characteristics and perceptions that contribute to teachers’ DESSA-Mini ratings, each 
predictor was added to the null model individually. Then, the significant predictors of teacher ratings were 
included in the final model to estimate their joint contribution to explaining rater bias in these data. 
Correlations and paired t-tests were used to examine relationships between variables across time points. All 
analyses were conducted using Stata 7 (Statacorp, 2001). 
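
For readers who want the model written out, the two-level structure described above can be expressed in standard random-intercept notation. The symbols are assumptions introduced here for illustration (they do not appear in the original analysis): Y_ij is the DESSA-Mini T-score of student i rated by teacher j, and X_j is a single teacher-level predictor.

```latex
% Null model: students (level one) nested within teacher raters (level two)
Y_{ij} = \gamma_{00} + u_{0j} + e_{ij},
\qquad u_{0j} \sim N(0, \tau_{00}),
\qquad e_{ij} \sim N(0, \sigma^{2})

% Null model plus one teacher characteristic X_j entered at level two
Y_{ij} = \gamma_{00} + \gamma_{01} X_{j} + u_{0j} + e_{ij}
```

In this notation, the coefficient on the teacher characteristic (gamma_01) plays the role of the b reported for each predictor, and a drop in tau_00 after predictors are added indicates how much of the rater-level variance those predictors explain.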

Results 

Social-Emotional Competence 

The average DESSA-Mini SET score was 50.88 (SD = 11.74). One fourth of the students received 
scores of 60 and above (strength), 57% received scores between 41 and 59 (typical), and 18% received scores 
of 40 and below (need for instruction). Approximately 16% (ICC = .16) of the variance in scores was 
attributable to teacher raters. 
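
The intraclass correlation (ICC) is the share of total score variance that lies between raters rather than between students rated by the same teacher. The sketch below estimates it with a one-way ANOVA estimator, which is a simpler technique swapped in for illustration and not the maximum likelihood estimation used in this study. The function name `anova_icc` and the toy scores are hypothetical.

```python
from statistics import fmean
from typing import Mapping, Sequence


def anova_icc(scores_by_rater: Mapping[str, Sequence[float]]) -> float:
    """One-way ANOVA estimate of the ICC for students nested within raters."""
    groups = [list(g) for g in scores_by_rater.values() if len(g) > 0]
    k = len(groups)                      # number of raters
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two raters and more students than raters")

    grand_mean = fmean([x for g in groups for x in g])
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Adjusted average group size, which allows for unequal class sizes.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    return (ms_between - ms_within) / (ms_between + (n0 - 1) * ms_within)


# Toy data: three hypothetical teachers rating two students each.
toy = {"A": [40, 50], "B": [50, 60], "C": [60, 70]}
print(anova_icc(toy))  # 0.6
```

An ICC of .16, as reported above, would mean that most of the variation in scores lies among students within the same classroom rather than between teachers.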

Bivariate Relationships between Student Social-Emotional Competence and Teacher Characteristics 

Teacher attitudes. Before training, nearly all teachers believed that Social-Emotional Competence 
was very important (39%) or essential (60%) to school success and 60% were eager or very eager to 
implement TOOLBOX. After training, all teachers believed that Social-Emotional Competence was very 
important (33%) or essential (67%) to school success and 83% were eager or very eager to implement 
TOOLBOX. While teachers’ attitude towards Social-Emotional Competence started high and remained 
statistically unchanged (t(60) = -1.63, p = .11), their eagerness to implement TOOLBOX was significantly 
higher after training (t(60) = -4.44, p < .001). No measure of teacher attitudes significantly predicted 
DESSA-Mini ratings (see Table II). 

Teacher capacities. Before training, very few teachers (3%) felt ‘sufficiently’ or ‘very’ informed 
about TOOLBOX. After training this was significantly higher; most teachers (82%) felt ‘sufficiently’ or 
‘very’ informed about TOOLBOX (t(61) = -16.72, p < .001). In addition, after training, 88% of teachers felt 
confident or very confident in their capacity to implement TOOLBOX. However, when asked to state the 
tagline associated with one of the Tools, only 35% of teachers provided a correct answer. There was no 
detectable relationship between feeling informed (r = .11, p = .45) or confident (r = .11, p = .41) and 
knowledge of the TOOLBOX tagline. Approximately 7 weeks after training, with implementation underway, 
significantly fewer teachers (20%) felt confident or very confident in their capacity to use TOOLBOX (t(61) = 
11.33, p < .001), but a comparable number of teachers (46%) provided the correct tagline (? = -.42, p = .81). 
In October, there still was no detectable relationship between confidence and knowledge (r = .09, p = .49). No 
teacher capacities (the extent to which teachers were informed, confident, or knowledgeable) measured at 
training or in October significantly predicted DESSA-Mini ratings in October. 

Teacher expectations. At the end of training, teachers had high expectations to fully implement 
TOOLBOX (M = 7.28, SD = 1.45). Teachers expected, on average, a moderate degree of challenge 
implementing TOOLBOX (M = 1.84, SD = .77). Before training, 34% of teachers expected TOOLBOX to be 
very beneficial to their students. After training, teacher expectations were significantly higher; 71% of 
teachers expected TOOLBOX to be very beneficial to their students (t(57) = -5.76, p < .001). 

At the end of training, the extent to which teachers expected to fully implement TOOLBOX 
significantly predicted their DESSA-Mini ratings (b = 1.00, p = .04). In addition, the extent to which teachers 
anticipated challenges to TOOLBOX implementation predicted DESSA-Mini ratings (b = -2.46, p = .003). 
The extent to which teachers expected TOOLBOX would benefit students at the end of training marginally 
predicted their DESSA-Mini ratings (b = 1.90, p = .09). 
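
As a rough check on how these p-values relate to the coefficients, each estimate can be divided by its standard error from Table II (0.49, 0.82, and 1.11, respectively) and referred to a standard normal distribution. The sketch below assumes a two-sided Wald z-test, which the paper does not explicitly name as its test, and the function name `wald_p_value` is hypothetical.

```python
import math


def wald_p_value(b: float, se: float) -> float:
    """Two-sided p-value for z = b / se under a standard normal reference."""
    z = b / se
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Coefficients from the text, standard errors from Table II.
for label, b, se in [
    ("intent to fully implement", 1.00, 0.49),
    ("anticipated challenges", -2.46, 0.82),
    ("expected benefit", 1.90, 1.11),
]:
    print(f"{label}: z = {b / se:.2f}, p = {wald_p_value(b, se):.3f}")
```

Under this assumption the three p-values round to .04, .003, and .09, in line with the values reported in the paragraph above.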

Teacher perceptions of implementation and impact. In October, teachers reported moderate levels of 
implementation (M = 5.95, SD = 2.18), significantly lower than their expectation at the end of training (t(56) 
= 5.01, p < .001). Seventy-seven percent of teachers agreed or strongly agreed that TOOLBOX had benefited 
their students. The benefits they perceived during implementation were significantly higher than the benefits 
they expected at the end of training (t(55) = 6.88, p < .001). Neither teacher perceptions of implementation 
nor teacher perceptions of impact significantly predicted their concurrent DESSA-Mini ratings (see Table II). 

School climate. In October, 80% of teachers perceived their school to be a good or great place for 
students to learn. However, 68% reported that the barriers to student learning were high or very high. 
Teachers generally reported (76%) that their school was a good or great place to work, but 64% reported that 
their stress level at work was high or very high (although only 24% experienced high or very high levels of 
conflict or tension at work). In October, teachers’ perception of barriers to student learning significantly 
predicted teachers’ concurrent DESSA-Mini ratings (b = -1.60, p = .04). No other concurrent measures of 
school climate significantly predicted teachers’ DESSA-Mini ratings. 

Table II. Bivariate Relationships between Teacher Characteristics and DESSA-Mini Ratings¹ 

| Teacher characteristic | # of responses | B | SE | p |
|---|---|---|---|---|
| Pre-Training Survey | | | | |
| Teacher Attitudes | | | | |
| Importance of SEL instruction for school success | 62 | -0.10 | 1.31 | 0.94 |
| Eagerness to use TOOLBOX | 62 | 0.58 | 0.73 | 0.43 |
| Teacher Capacities | | | | |
| Informed about TOOLBOX | 63 | 0.12 | 0.82 | 0.89 |
| Teacher Expectations | | | | |
| Extent to which TOOLBOX will benefit students | 59 | 0.96 | 0.86 | 0.26 |
| Post-Training Survey | | | | |
| Teacher Attitudes | | | | |
| Importance of SEL instruction for school success | 63 | 0.71 | 1.42 | 0.62 |
| Eagerness to use TOOLBOX | 63 | 0.30 | 0.94 | 0.75 |
| Teacher Capacities | | | | |
| Informed about TOOLBOX | 63 | 0.21 | 0.92 | 0.82 |
| Confidence in own capacity to use TOOLBOX | 63 | 1.22 | 0.91 | 0.18 |
| Knowledgeable about TOOLBOX | 53 | 0.30 | 1.46 | 0.84 |
| Teacher Expectations | | | | |
| Extent of challenges in implementing TOOLBOX | 63 | -2.46 | 0.82 | <0.001 |
| Intent to fully implement TOOLBOX | 61 | 1.00 | 0.49 | 0.04 |
| Extent to which TOOLBOX will benefit students | 59 | 1.90 | 1.11 | 0.09 |
| October SEL Programming Survey | | | | |
| Teacher Capacities | | | | |
| Confidence in own capacity to use TOOLBOX | 69 | 0.77 | 0.92 | 0.40 |
| Knowledgeable about TOOLBOX | 63 | 0.98 | 1.32 | 0.46 |
| Teacher Perceptions | | | | |
| Level of full implementation of TOOLBOX | 66 | 0.29 | 0.29 | 0.32 |
| Extent to which TOOLBOX has benefited students | 64 | 1.02 | 0.87 | 0.24 |
| School Climate | | | | |
| Barriers to student learning | 70 | -1.60 | 0.78 | 0.04 |
| Barriers to effective instruction | 69 | -0.98 | 0.69 | 0.16 |
| Stress level at work | 69 | 0.52 | 0.79 | 0.51 |
| Conflict level at work | 69 | 0.72 | 0.57 | 0.21 |
| Perception of learning climate of this school | 70 | -0.52 | 0.84 | 0.54 |
| Perception of working climate at this school | 69 | -0.77 | 0.80 | 0.33 |
| Social-Emotional Competence | | | | |
| Own social-emotional competence | 70 | 0.10 | 0.91 | 0.92 |
| Typical colleagues’ social-emotional competence | 69 | 0.10 | 0.95 | 0.92 |
| Typical students’ social-emotional competence | 70 | 3.35 | 0.84 | <0.001 |

¹ Total sample: n = 72 


Social-Emotional Competence (SEC) in self, colleagues, and students. In October, teachers reported, 
on a scale from 0-4, their own SEC as they imagined others perceived it, the SEC of a ‘typical’ colleague, and 
the SEC of a ‘typical’ student. On average, teachers reported their own SEC (M = 2.76, SD = .69) to be higher 
than that of a typical colleague (M = 2.49, SD = .68; t(68) = 2.11, p = .008). In fact, 70% of teachers reported 
themselves as having high or very high SEC, while 54% of teachers reported their colleagues as having high 
or very high SEC. When teachers reported the SEC of their ‘typical’ student (M = 1.69, SD = .69), only 7% 
reported their students as having high or very high SEC. 

Neither teachers’ reports of their own SEC nor their reports of their typical colleagues’ SEC 
significantly predicted DESSA-Mini ratings. However, teacher reports of their ‘typical’ student’s SEC 
predicted DESSA-Mini ratings (b = 3.35, p < .001). 

Multivariate Relationship between Student Social-Emotional Competence and Teacher Characteristics 

Four teacher characteristics (each statistically significant in the bivariate models) were included 
together in a model to estimate teachers’ DESSA-Mini ratings (see Table III). Teachers’ higher post-training 
intent to fully implement TOOLBOX (b = .94, p = .03), and teachers’ higher October reports of their ‘typical’ 
student’s SEC (b = 2.79, p = .004) continued to significantly predict higher DESSA-Mini ratings. Teachers’ 
higher post-training anticipation of challenge in implementing TOOLBOX (b = -1.90, p = .02) and higher 
October perception of barriers to student learning (b = -1.70, p = .02) significantly predicted lower 
DESSA-Mini ratings. A likelihood ratio test confirmed that the full multivariate model better fit the data than the null 
model (χ²(4) = 28.64, p < .001). In the final model, only 10% of the variance in DESSA-Mini ratings (ICC = .10) 
remained attributable to unexplained differences between teachers, down from 16% in the null model. 
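
The likelihood ratio test reported above can be checked by hand. For an even number of degrees of freedom, the chi-square upper-tail probability has a closed form, so no statistics library is needed. The sketch below is only a verification aid under that closed form; the function name is hypothetical and does not reflect the software used in the study.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    # For df = 2k: P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    series = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * series

# Test statistic and degrees of freedom as reported in the text.
p_value = chi2_upper_tail_even_df(28.64, 4)
print(f"p = {p_value:.2e}")
```

The resulting probability is far below .001, consistent with the reported result.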


Table III. Multivariate Relationships between Teacher Characteristics and DESSA-Mini Ratings 

                                            Model 1                    Model 2
                                         B      SE        p          B      SE        p
Intercept                            50.49    0.62   <0.001      47.42    4.83   <0.001
Intent to fully implement
  TOOLBOX (post-training)                                         0.94    0.42     0.03
Anticipated challenge in TOOLBOX
  implementation (post-training)                                 -1.90    0.81     0.02
Perceived barriers to student
  learning (October)                                             -1.70    0.73     0.02
Typical student’s social-emotional
  competence (October)                                            2.79    0.96    0.004
Intraclass correlation
  Between teachers                    0.16                        0.10

Discussion 

This study explored the extent to which teacher characteristics predicted teacher ratings of student 
Social-Emotional Competence on the DESSA-Mini. The DESSA-Mini is a behavior rating scale being used 
to assess approximately a half million children worldwide. Despite their popularity, behavioral rating scales 
are believed to incorporate rater bias into assessment scores because each item requires interpretation, 
reflection, and judgment by the rater. We found that only a small amount of the variance in DESSA-Mini 
scores was attributable to raters and that a portion of the rater bias (i.e., unexplained variance at the teacher level; 
the intraclass correlation fell from .16 to .10 once rater characteristics were modeled) 
could be explained by four rater characteristics: teachers’ expectations about their own level of SEL program 
implementation, anticipation of implementation challenges, perceptions of the barriers their students face, and 
perceptions of Social-Emotional Competence among their students. 
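
One way to express how much teacher-level variance the four characteristics account for is to convert the two intraclass correlations in Table III into between-teacher variances. The sketch below rests on a strong assumption that is not confirmed by the text: that the student-level (residual) variance is the same in the null and full models. The function names are illustrative, and an estimate based on the fitted variance components need not equal this value.

```python
def teacher_variance_ratio(icc):
    """Between-teacher variance as a multiple of the student-level variance.

    Follows from ICC = tau2 / (tau2 + sigma2), so tau2 / sigma2 = ICC / (1 - ICC).
    """
    return icc / (1.0 - icc)

def share_of_teacher_variance_explained(icc_null, icc_full):
    """Proportional drop in between-teacher variance, ASSUMING sigma2 is unchanged."""
    tau_null = teacher_variance_ratio(icc_null)
    tau_full = teacher_variance_ratio(icc_full)
    return 1.0 - tau_full / tau_null

# ICC values from Table III (Model 1 = null model, Model 2 = full model).
share = share_of_teacher_variance_explained(0.16, 0.10)
print(f"Share of teacher-level variance explained: {share:.1%}")
```

Under the stated assumption, the share comes to roughly 42% of the between-teacher variance.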

Teacher Attitudes, Capacities, and Expectations 

Teachers at training felt that Social-Emotional Competence was important to school success, were 
eager to implement TOOLBOX, and were sufficiently informed and confident in their capacity to implement 
TOOLBOX, but neither these attitudes nor capacities biased subsequent DESSA-Mini ratings. It is possible 
that demand characteristics limited the variance in teacher reports of their attitudes and capacities, which 
attenuated the relationships between these teacher attributes and their DESSA-Mini ratings. On the other 
hand, this potential was minimized by having a third party collect the data, rather than the district or the SEL 
program developer. 

After training, teachers had high expectations for their personal implementation of TOOLBOX. 
Higher expectations for implementation at training predicted more lenient DESSA-Mini ratings in October. 
Teacher reports of their actual implementation were lower than their earlier expectations, and did not predict 
their concurrent DESSA-Mini ratings. Furthermore, teachers had high expectations for the impact of 
TOOLBOX after training, and perceived even greater benefit seven weeks into implementation, but 
neither predicted DESSA-Mini scores. 

After training, teachers anticipated TOOLBOX implementation to be only moderately challenging. 
Higher anticipation of implementation challenges at training predicted more severe DESSA-Mini ratings in 
October. Teachers were not asked about the extent of the actual challenges that they faced when surveyed in 
October. 

Teacher Perceptions of the School Climate and the Students 

Teachers felt the school learning environment was positive, although they perceived high student 
barriers to learning. Higher teacher perceptions of student barriers to learning predicted more severe 
DESSA-Mini ratings. Teachers also felt the school working environment was positive, despite the high levels of 
teacher stress. Teachers perceived their colleagues to have above-average Social-Emotional Competence, and 
their own Social-Emotional Competence to be even higher. Neither teacher perception of their own reputation 
for Social-Emotional Competence, nor teacher perceptions of their colleagues’ Social-Emotional 
Competence, biased concurrent teacher DESSA-Mini ratings. 

Although teachers generally perceived themselves and their colleagues to have above-average 
Social-Emotional Competence, they perceived their typical student as having below-average Social-Emotional 
Competence. The DESSA-Mini scores, however, were fairly comparable in this district to the normative 
sample. More students in this district (25%) had strengths, relative to the national norm (16%), and a similar 
proportion of students in this district (18%) had a need for instruction, relative to the national norm (16%). These 
data suggest that, before completing a formal assessment, teachers underestimated their students. Given the barriers that 
students in this district face (e.g., 70% low-income status), the level of protective factors is impressive as well 
as important. Higher teacher perceptions of general levels of student Social-Emotional Competence predicted 
more lenient DESSA-Mini ratings. 
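
The 16% national-norm figures for both categories are what a normally distributed standard score would yield at one standard deviation from the mean. The sketch below illustrates this under assumptions that this section does not state: that DESSA-Mini scores are T-scores (mean 50, SD 10) in the normative sample, and that the 'strength' and 'need for instruction' categories begin at 60 and 40. The constant and function names are illustrative.

```python
import math

# ASSUMED scoring metric and cutoffs (not stated in this section).
T_MEAN, T_SD = 50.0, 10.0
STRENGTH_CUTOFF, NEED_CUTOFF = 60.0, 40.0

def normal_upper_tail(z):
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Share of a normal reference population at or above the strength cutoff.
norm_strength_share = normal_upper_tail((STRENGTH_CUTOFF - T_MEAN) / T_SD)
# Share at or below the need-for-instruction cutoff.
norm_need_share = 1.0 - normal_upper_tail((NEED_CUTOFF - T_MEAN) / T_SD)

print(f"strength: {norm_strength_share:.1%}, need: {norm_need_share:.1%}")
```

Both shares round to 16%, the benchmark against which the district's 25% and 18% are compared above.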

Although teachers’ broad-based impressions of their ‘typical’ student explain some variance in 
DESSA-Mini scores, the current study design does not enable us to determine whether perceptions of the 
‘typical’ student (a) shape every DESSA-Mini rating completed, revealing rater bias, or (b) reflect the actual 
level of Social-Emotional Competence in their students, astutely and accurately observed by teachers. 
To the extent that this is interpreted as an undesirable bias, replication in other samples could 
determine whether the routine collection of this information and a score adjustment are warranted. 

Limitations and contributions 

This study was limited in several respects. First, it was conducted in a single Northern California 
school district, which limits the generalizability of the findings. However, the district includes a diverse 
student body and the sample is described at length to facilitate judgments about the transferability of the 
findings. Second, teachers provided the DESSA-Mini ratings and information about themselves, which could 
create method bias. However, we think this is appropriate given our research question and frame that teachers 
are central to the assessment and intervention process. Third, requirements for survey brevity prevented us 
from using multi-item scales to assess teacher characteristics. This could increase (a) missingness, potentially 
heightening the risk of sampling error, and (b) measurement error, reducing power to detect effects. Finally, 
studies have reported that timing in the school year matters; bias is higher when raters are unfamiliar with 
assessment tools and when the students are less known to the rater (Evans, Allen, Moore, & Strauss, 2005; 
Hoyt & Kerns, 1999). Future studies should examine the extent of rater bias that exists at the end of the 
school year, and provide guidance for practitioners and evaluators using the behavior rating scale to measure 
change over time. 

An important contribution of this study is clarifying the extent to which DESSA-Mini scores reflect 
characteristics of the rater, in addition to characteristics of the individual student being rated. We find that 
approximately 16% of the variance in DESSA-Mini scores is attributable to the teacher rater. To the best of 
our knowledge, this is the first study to report this information about the DESSA-Mini, which contains 
somewhat less bias than the 20%-50% of the variance attributable to the rater reported on other tools (Molina, 
Pelham, Blumenthal, & Galiszewski, 1998; Phillips & Lonigan, 2010; Schultz & Evans, 2012). It should be 
noted, however, that variance attributed to the teacher could also be attributed to the classroom or school 
environment. Students in the same environment are likely to behave more similarly to each other 
than students in different environments. Future research with more schools may want to use a 3-level 
statistical model to estimate the variance attributable to the school. 
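
To illustrate why a 3-level model matters, the sketch below partitions total variance into school, teacher, and student components. The variance values are entirely hypothetical, chosen only so that the school and teacher shares sum to the 16% observed in the 2-level model; they are not estimates from this study.

```python
def variance_shares(components):
    """Return each level's share of the total variance."""
    total = sum(components.values())
    return {level: value / total for level, value in components.items()}

# HYPOTHETICAL variance components for a 3-level model (students in teachers in schools).
components = {"school": 4.0, "teacher": 12.0, "student": 84.0}
shares = variance_shares(components)

# A 2-level model that omits schools folds school variance into the teacher level.
two_level_teacher_share = shares["school"] + shares["teacher"]

for level, share in shares.items():
    print(f"{level}: {share:.0%}")
print(f"teacher share if schools are ignored: {two_level_teacher_share:.0%}")
```

In this hypothetical case, a quarter of the variance that a 2-level model attributes to teachers would actually lie between schools.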

Implications & Future Directions 

This study has important implications for practice. Overall, TOOLBOX training was well received in 
this district. Training increased the extent to which teachers felt informed, confident, and eager to use the 
program, resulting in high intent to fully implement the program and the expectation for student benefit. Once 
implementation had begun, teachers perceived a greater benefit than they expected, but they reported lower 
levels of capacity and were implementing the program less fully than they planned. This suggests that, despite 
the benefits of initial training, booster training or ongoing technical assistance might be useful, potentially 
because of implementation challenges encountered in routine practice. 

Furthermore, it may be that the teacher characteristics identified in this study as sources of rater bias 
can be remediated through training, intervention planning, and implementation supports in order to shrink the 
systematic error in student assessments. For example, providing rater training for assessment tools has been 
shown to reduce, although not eliminate, rater bias. In this study raters were provided with as-needed 
technical assistance in the completion of the DESSA-Mini, but did not participate in any of the DESSA 
trainings available through the Devereux Center for Resilient Children. Future studies should explore whether 
training to use the DESSA specifically, or training about rater bias in general (including the sources identified 
in this study), would shrink the extent of rater bias in the assessment process. 

Interesting implications to guide future research also emerged. In this study, the extent to which 
teachers felt informed and confident in their capacity to use TOOLBOX was not associated with the extent to 
which they correctly recalled essential information about the TOOLBOX program. As the best way to assess 
the knowledge of prevention program implementers remains unresolved in the literature (Shapiro, Oesterle, & 
Hawkins, 2015), it would be important to understand how teacher perceptions of capacity relate to their actual 
knowledge. Finally, future studies should look for sources of rater bias beyond the field of implementation 
science to explain remaining (unmeasured sources of) variance. Some studies have found that raters’ fixed 
characteristics (e.g., age, gender) can bias ratings (Schultz & Evans, 2012). Although we only explored 
theoretically malleable characteristics in the current study, fixed characteristics may be useful for researchers 
who wish to approximate true levels of student Social-Emotional Competence. Future studies should also 
examine student-level characteristics to see if dyad-specific variance, or interactions between student and 
teacher characteristics, systematically bias scores. 

Psychometrically sound assessment tools may facilitate the discovery and implementation of effective 
Social Emotional Learning (SEL) programs. Findings of this study help us unpack student assessment scores 
into their component parts, which may increase our responsible use of behavior rating scales like the 
DESSA-Mini for the rating of student Social-Emotional Competence. Responsible use of such tools in research and 
practice has the potential to facilitate the routine implementation and evaluation of SEL programs to help 
ameliorate mental, emotional, and behavioral problems in young people. 